Ranking-Constrained Keyword Sequence Extraction from Web Documents
Abstract
Given a large volume of Web documents, we consider the problem of finding, for each document, the shortest keyword sequence such that, when the sequence is submitted to a given search engine, the corresponding Web document is retrieved and ranked first in the results. We call this system an Inverse Search Engine (ISE). Once such a sequence is found for a document, submitting it to that search engine returns the document as the top result. The resulting keyword sequence is search-engine dependent. The ISE can therefore be used as a tool for managing Web content in terms of the extracted shortest keyword sequences. In this way, the traditional keyword extraction process is constrained by the document ranking method adopted by the search engine. The significance is that all searchable documents on the World Wide Web can then be partitioned according to their keyword phrases. This paper discusses the design and implementation of the proposed ISE. Four evaluation measures are proposed and used to show the effectiveness and efficiency of our approach. The experimental results establish a test benchmark for further research.
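The problem statement above can be illustrated with a minimal sketch. The abstract does not describe the actual ISE algorithm or the search engine used, so everything here is an assumption made for illustration: `toy_search` is a hypothetical stand-in engine that ranks by summed term frequency, the corpus is invented, keyword order is ignored, and the search is a brute-force enumeration of candidate sequences by increasing length.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real engines do far more)."""
    return text.lower().split()


def toy_search(query, corpus):
    """Hypothetical stand-in for a search engine.

    Only documents containing every query term match; matches are ranked by
    the summed term frequency of the query terms. Ties are broken by document
    id so that the ranking is deterministic.
    """
    results = []
    for doc_id, text in corpus.items():
        counts = Counter(tokenize(text))
        if all(counts[term] > 0 for term in query):
            results.append((sum(counts[term] for term in query), doc_id))
    results.sort(key=lambda r: (-r[0], r[1]))
    return [doc_id for _, doc_id in results]


def shortest_keyword_sequence(doc_id, corpus, search=toy_search, max_len=3):
    """Return the first shortest keyword set that ranks doc_id first, or None.

    Candidates are drawn from the document's own vocabulary and tried in
    order of increasing length, so the first hit is a shortest one.
    """
    vocab = sorted(set(tokenize(corpus[doc_id])))
    for length in range(1, max_len + 1):
        for query in combinations(vocab, length):
            ranked = search(query, corpus)
            if ranked and ranked[0] == doc_id:
                return list(query)
    return None


# Invented three-document corpus, used only to exercise the sketch.
CORPUS = {
    "a": "apple banana apple",
    "b": "banana cherry",
    "c": "apple cherry banana cherry",
}

if __name__ == "__main__":
    for doc in CORPUS:
        print(doc, shortest_keyword_sequence(doc, CORPUS))
```

Because the result depends entirely on the ranking function passed as `search`, swapping in a different engine can change or eliminate the sequence found for a document, which mirrors the search-engine dependence noted in the abstract. Under this toy ranking, a document may have no qualifying sequence at all, in which case the function returns `None`.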
Similar resources
Toward Network-based Keyword Extraction from Multitopic Web Documents
In this paper we analyse the selectivity measure calculated from the complex network in the task of automatic keyword extraction. Texts, collected from different web sources (portals, forums), are represented as directed and weighted co-occurrence complex networks of words. Words are nodes and links are established between two nodes if they are directly co-occurring within a sentence. We ...
Graph-Based Keyword Extraction for Single-Document Summarization
In this paper, we introduce and compare two novel approaches, supervised and unsupervised, for identifying the keywords to be used in extractive summarization of text documents. Both our approaches are based on the graph-based syntactic representation of text and web documents, which enhances the traditional vector-space model by taking into account some structural document features. In...
Relevant Pages in semantic Web Search Engines using Ontology
In general, search engines are the most popular means of searching for any kind of information on the Internet. Generally, keywords are given to the search engine and the Web database returns the documents containing the specified keywords. In many situations, irrelevant results are returned for the user query since different keywords are used in different forms in various documents. The devel...
Ontology driven Pre and Post Ranking based Information Retrieval in Web Search Engines
With the tremendous growth of the World Wide Web, it has become necessary to organize information in such a way that it is easier for end users to find the information they want efficiently and accurately. This requires a pre-ranking of the underlying similar documents after the formation of the index. Thereafter the ranking of the search results in response to a query takes place wh...
Publication date: 2009